Milestone | Planetary Weather Data Analysis¶

NASA Logo

Introduction¶

In this Milestone, you'll role play as an intern at NASA, where you'll work to analyze weather data collected from a planetary rover. The dataset, planet_weather.csv contains atmospheric measurements from a planet in our solar system, but the planet's identity has not been disclosed. Your objective is to apply data inspection, cleaning, and analysis techniques to draw conclusions about the planet based on the provided weather data.

The dataset, located in the datasets/ folder is called planet_weather.csv, and it contains the following information:

  • terrestrial_date: Date on planet Earth, captured as yyyy-mm-dd.

  • sol: number of elapsed planetary days since beginning measurement.

  • ls: solar longitude. 0: fall equinox. 90: winter solstice. 180: spring equinox. 270: summer solstice.

  • month: the month number on the mystery planet.

  • min_temp: the minimum temperature, in Celsius, during a single day.

  • pressure: atmospheric pressure, in Pascals

  • wind_speed: average wind speed, in meters per second.

  • atmo_opacity: atmospheric opacity.

To start, import both the pandas and plotly.express libraries, and load the data into a DataFrame.

In [1]:
# import pandas and plotly express libraries
import pandas as pd
import plotly.express as px
In [2]:
# load planet_weather.csv data from datasets folder
weather = pd.read_csv('datasets/planet_weather.csv')

Task 1: Data Inspection 🔍¶

Before analyzing anything, it's essential to understand the structure and contents of your dataset. You'll start by previewing the data and checking for missing values or unusual patterns.

In [3]:
# preview the data
weather.head()
Out[3]:
id terrestrial_date sol ls month min_temp max_temp pressure wind_speed atmo_opacity
0 1895 2018-02-27 1977 135 Month 5 -77.0 -10.0 727.0 NaN Sunny
1 1893 2018-02-26 1976 135 Month 5 -77.0 -10.0 728.0 NaN Sunny
2 1894 2018-02-25 1975 134 Month 5 -76.0 -16.0 729.0 NaN Sunny
3 1892 2018-02-24 1974 134 Month 5 -77.0 -13.0 729.0 NaN Sunny
4 1889 2018-02-23 1973 133 Month 5 -78.0 -18.0 730.0 NaN Sunny

1. How many rows and columns are there in the dataset?

In [4]:
# dataset rows and columns
print(weather.shape)
(1894, 10)

2. What are the names of all the columns?

In [5]:
# dataset columns
weather.columns
Out[5]:
Index(['id', 'terrestrial_date', 'sol', 'ls', 'month', 'min_temp', 'max_temp',
       'pressure', 'wind_speed', 'atmo_opacity'],
      dtype='object')

3. What is the data type of each column?

Hint: There are a few ways to do this! Both .info and .dytpes will work.
In [6]:
# data types of each column
weather.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1894 entries, 0 to 1893
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   id                1894 non-null   int64  
 1   terrestrial_date  1894 non-null   object 
 2   sol               1894 non-null   int64  
 3   ls                1894 non-null   int64  
 4   month             1894 non-null   object 
 5   min_temp          1867 non-null   float64
 6   max_temp          1867 non-null   float64
 7   pressure          1867 non-null   float64
 8   wind_speed        0 non-null      float64
 9   atmo_opacity      1894 non-null   object 
dtypes: float64(4), int64(3), object(3)
memory usage: 148.1+ KB

4. How many null values are there in each column? For each column, sum up the number of null values.

In [7]:
# null values in each column
weather.isnull().sum()
Out[7]:
id                     0
terrestrial_date       0
sol                    0
ls                     0
month                  0
min_temp              27
max_temp              27
pressure              27
wind_speed          1894
atmo_opacity           0
dtype: int64

5. Provide a statistical summary of the DataFrame.

In [8]:
# Statistical summary of the DataFrame
weather.describe()
Out[8]:
id sol ls min_temp max_temp pressure wind_speed
count 1894.000000 1894.000000 1894.000000 1867.000000 1867.000000 1867.000000 0.0
mean 948.372228 1007.930306 169.180570 -76.121050 -12.510445 841.066417 NaN
std 547.088173 567.879561 105.738532 5.504098 10.699454 54.253226 NaN
min 1.000000 1.000000 0.000000 -90.000000 -35.000000 727.000000 NaN
25% 475.250000 532.250000 78.000000 -80.000000 -23.000000 800.000000 NaN
50% 948.500000 1016.500000 160.000000 -76.000000 -11.000000 853.000000 NaN
75% 1421.750000 1501.750000 259.000000 -72.000000 -3.000000 883.000000 NaN
max 1895.000000 1977.000000 359.000000 -62.000000 11.000000 925.000000 NaN

6. Based on the dataset’s shape, column types, and statistical summary, what initial observations can you make about the data’s structure, completeness, and potential quality issues?

Try This AI Prompt: Summarize key things a data analyst should look for when reviewing .shape, .info(), and .describe() outputs from a new dataset.

The dataset contains 1,894 rows and 10 columns, with a mix of numeric, categorical, and date-related variables describing planetary weather. Most columns are complete, but min_temp, max_temp, and pressure each have 27 missing values, while wind_speed is completely empty and likely unusable. The terrestrial_date column is stored as an object and should be converted to a datetime format for time-based analysis, and the month column may need cleaning due to its "Month X" format. The temperature and pressure values suggest a very cold, low-pressure environment, possibly indicating Mars. Overall, the dataset is mostly clean, but minor data type corrections and handling of missing values are needed before deeper analysis.

Task 2: Data Cleaning¶

Now that you’ve inspected the data, your next step is to check for any columns that might need to be cleaned or removed.

1. Are there any columns with mostly missing values? Perhaps the wind speed sensor was broken! Remove this column from the dataframe.

In [9]:
# Delete wind_speed column, which is filled with null values
weather.drop('wind_speed', axis = 1, inplace = True)

2A. Now, check for columns that might not add much value to your analysis.

Are there any columns where almost every value is the same?

Take a close look at the atmo_opacity column. How many unique values are there? How frequent are they?

Hint: Both .nunique and .value_counts will work here!
In [10]:
# How many unique values are there in the atmo_opacity column?
print(weather['atmo_opacity'].nunique())

# Show the frequency of each value
print(weather['atmo_opacity'].value_counts())
2
Sunny    1891
--          3
Name: atmo_opacity, dtype: int64

2B. The atmosphere sensors seem to have been faulty and did not capture accurate data. Drop this column, which contains identical values.

In [11]:
# Drop the atmo_opacity column
weather.drop('atmo_opacity', axis = 1, inplace = True)

Task 3: Data Analysis & Visualization¶

Let’s explore trends in the planetary weather using groupings and charts. This will help you uncover seasonal patterns and key environmental characteristics of the mystery planet.

You’ll need to use the .groupby() method here to analyze how temperature and pressure vary across months. Store your grouped results in new DataFrames to make them easier to visualize.

1. How many months are there on this planet?

In [12]:
# Number of unique months
print(weather['month'].nunique())
12

2A. What is the average minimum temperature for each month?

In [13]:
# Average min_temp each month
avg_min_temp_by_month = weather.groupby('month').agg({'min_temp' : 'mean'}).reset_index()
print(avg_min_temp_by_month)
       month   min_temp
0    Month 1 -77.160920
1   Month 10 -71.982143
2   Month 11 -71.985507
3   Month 12 -74.451807
4    Month 2 -79.932584
5    Month 3 -83.307292
6    Month 4 -82.747423
7    Month 5 -79.308725
8    Month 6 -75.299320
9    Month 7 -72.281690
10   Month 8 -68.382979
11   Month 9 -69.171642

2B. Using your grouped results above to plot a bar chart of the average minimum temperature by month.

In [14]:
# Bar chart of the average min_temp by month
fig = px.bar(avg_min_temp_by_month,
      x = 'month',
      y = 'min_temp',
      title = 'Average Minimum Temperature by Month',
      labels = {'month':'Month', 'min_temp':'Avg Min Temp (C)'},
      color = 'min_temp',  # Adds color scale based on temperature values
      color_continuous_scale='RdBu') # Choose a color scale

fig.update_layout(
    title={'x': 0.5},  # Centers the title horizontally
    coloraxis_colorbar=dict(title='Temp (C)')  # Rename color bar
)

fig.show()
Try This AI Prompt: In my Plotly Express bar chart, the x-axis shows months but they're sorted alphabetically (Month 1, Month 10, Month 11, etc.) instead of Month 1 - Month 12 order. How can I replot my data so that the x-axis sorts numerically by month?

2C. Based on the minimum temperature, what is the coldest month? What is the warmest month?

The coldest month based on average minimum temperature is Month 3, while the warmest month is Month 8.

3A. What is the average pressure for each month?

In [15]:
# What is the average pressure for each month?
avg_pressure_by_month = weather.groupby('month').agg({'pressure': 'mean'}).reset_index()
print(avg_pressure_by_month)
       month    pressure
0    Month 1  862.488506
1   Month 10  887.312500
2   Month 11  857.014493
3   Month 12  842.156627
4    Month 2  889.455056
5    Month 3  877.322917
6    Month 4  806.329897
7    Month 5  748.557047
8    Month 6  745.054422
9    Month 7  795.105634
10   Month 8  873.829787
11   Month 9  913.305970

3B. Using your grouped results above to plot a bar chart of the average atmospheric pressure by month.

In [16]:
# Bar chart of the average atmospheric pressure by month
fig = px.bar(
    avg_pressure_by_month,
    x='month',
    y='pressure',
    title='Average Atmospheric Pressure by Month',
    labels={'month': 'Month', 'pressure': 'Avg Pressure (Pa)'},
    color='pressure',
    color_continuous_scale='Viridis'
)

# Center the title
fig.update_layout(title={'x': 0.5})

fig.show()

4. Plot a line chart of the daily atmospheric pressure by terrestrial date.

Note: You do not need to modify the dataframe or group any data! Just use the original data.
In [17]:
# Line chart of the daily atmospheric pressure by terrestrial date
fig = px.line(
    weather,
    x='terrestrial_date',
    y='pressure',
    title='Daily Atmospheric Pressure Over Time',
    labels={'terrestrial_date': 'Date', 'pressure': 'Pressure (Pa)'}
)

# Center the title
fig.update_layout(title={'x': 0.5})

fig.show()

5. Plot a line chart the daily minimum temp.

Note: You do not need to modify the dataframe or group any data! Just use the original data.
In [18]:
# Line chart the daily minimum temp
fig = px.line(
    weather,
    x='terrestrial_date',
    y='min_temp',
    title='Daily Minimum Temperature Over Time',
    labels={'terrestrial_date': 'Date', 'min_temp': 'Min Temperature (C)'}
)

# Center the title
fig.update_layout(title={'x': 0.5})

fig.show()

6. Based on this information, approximately how many earth days are there in a year on this planet?

Hint: To do this, get an approximate range for how long a "year" is on this planet by looking at the plot of either the atmospheric pressure or the temperature by day. Use the visualization to get a *rough estimate* between matching "peaks".

Based on the visual patterns, there are approximately 687 Earth days in a year on this planet, which closely matches the Martian year.

7. What is the identity of the planet? Go to this wesbsite and see what planet this lines up with!

Based on the data patterns and comparison with NASA’s information, the planet is Mars.

Congratulations on making your first extra-terrestrial discovery!¶

gif of clapping

LevelUp¶

Earlier in the milestone you investigated how many months were in our Mystery Planet. Unfortunately, the answer (12) was not very satisfying. This is because there is no standard calendar for Mars. When the data was collected, they used 12 "months" though each month is longer than a typical Earth month. Let's investigate!

First, filter your dataset so that you are only looking at any terrestrial_date before 2014.

Hint: Since the terrestrial_date is a string data type, this is simply checking for all values that satisfy the condition < '2014' (don't forget the quotation marks!)
In [19]:
# filter to all values where terrestrial_date is before 2014
# store it in a new variable.
pre_2014_data = weather[weather['terrestrial_date'] < '2014']
# show dataframe
print(pre_2014_data)
       id terrestrial_date  sol   ls    month  min_temp  max_temp  pressure
1453  432       2013-12-31  499   69  Month 3     -84.0     -30.0     899.0
1454  424       2013-12-30  498   69  Month 3     -86.0     -28.0     901.0
1455  425       2013-12-29  497   69  Month 3     -86.0     -30.0     901.0
1456  428       2013-12-28  496   68  Month 3     -85.0     -26.0     901.0
1457  431       2013-12-27  495   68  Month 3     -86.0     -26.0     900.0
...   ...              ...  ...  ...      ...       ...       ...       ...
1889   24       2012-08-18   12  156  Month 6     -76.0     -18.0     741.0
1890   13       2012-08-17   11  156  Month 6     -76.0     -11.0     740.0
1891    2       2012-08-16   10  155  Month 6     -75.0     -16.0     739.0
1892  232       2012-08-15    9  155  Month 6       NaN       NaN       NaN
1893    1       2012-08-07    1  150  Month 6       NaN       NaN       NaN

[441 rows x 8 columns]
If done correctly, you will have a dataframe with 441 rows

Lastly, for each month in the dataframe, return both the min value and the max value of the terrestrial_date field.

Hint #1: You'll need to use Use .groupby()

Hint #1: In the dictionary for your .agg() function, pass a list of values. e.g. {col_to_check: [func1, func2]}. Remember to use the appropriate values / variables from the dataframe and the desired functions.
In [20]:
# For each month, calculate the minimum AND maximum terrestrial_date
date_range_by_month = pre_2014_data.groupby('month').agg({'terrestrial_date': ['min', 'max']})

# Display the result
print(date_range_by_month)
         terrestrial_date            
                      min         max
month                                
Month 1        2013-08-01  2013-10-02
Month 10       2013-02-24  2013-04-12
Month 11       2013-04-13  2013-06-04
Month 12       2013-06-05  2013-07-31
Month 2        2013-10-03  2013-12-08
Month 3        2013-12-09  2013-12-31
Month 6        2012-08-07  2012-09-29
Month 7        2012-09-30  2012-11-19
Month 8        2012-11-20  2013-01-07
Month 9        2013-01-08  2013-02-23

How many Earth days, roughly, are there in each "month" in the mystery planet? Does that lineup with what you expected now that you know the identity of the mystery planet?

On average, each “month” on the mystery planet (Mars) is about 50–65 Earth days long, which makes sense given that a full Martian year is about 687 Earth days.

In [ ]: